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ABSTRACT 

An  algorithm  is  presented  for  adaptively  partitioning  a  multidimen¬ 
sional  coordinate  space  based  on  optimization  of  a  scalar  function 
of  the  coordinates.  The  goal  is  to  construct  a  set  of  hyperrectangu- 
lar  regions,  such  that  the  variation  of  function  values  within  each 


region  is  small.  These  regions  are  then  used  as  the  basis  for  a 
stratified  sampling  estimate  of  the  definite  integral  of  the  function. 


1 .  INTRODUCTION 


Although  the  numerical  evaluation  of  multiple  integrals  has  received 
considerable  attention,  it  remains  a  difficult  problem.  Most  general- 
purpose  techniques  today  apply  to  integrands  that  are  relatively  "well 
behaved"  in  that  the  integrand  can  be  reasonably  well  approximated  by  a 
low  order  polynomial  within  the  region  of  integration.  One  technique 
for  trying  to  deal  with  integrands  that  are  not  so  "well  behaved"  is  to 
partition  the  region  of  integration  "adaptively"  into  subregions,  chosen 
so  that  the  integrand  is  well-behaved  within  each  subregion  [Hal ton  and 
Zeidman,  1971;  Lautrup,  1971,  Genz,  1972;  Kahaner  and  Wells,  1979; 

Sasaki,  1978;  LePage  1978].  Standard  techniques  can  then  be  applied  to 
evaluate  the  integral  in  each  subregion,  and  the  integral  over  the  entire 
region  is  taken  as  the  sum  of  the  integrals  over  the  subregions. 

Such  partitioning  can  sometimes  be  carried  out  manually  for  inte¬ 
grands  of  simple  functional  form  with  few  variables,  but  this  is  usually 
not  possible  for  complicated  or  large-dimensional  integrals.  Several 
automatic  partitioning  strategies  have  recently  been  proposed,  based  on 
estimated  properties  of  the  integrand.  Some  of  these  strategies  rely  on 
factored  approximations  [Lautrup,  1971;  LePage,  1978;  Sasaki,  1978]  while 
the  others  are  general  multidimensional  adaptive  procedures  [Hal ton  and 
Zeidman,  1971;  Genz,  1972;  Kahaner  and  Wells,  1979]. 

All  of  these  adaptive  techniques  (as  well  as  the  one  presented  here) 
are  iterative,  and  are  based  on  top-down  successive  refinement.  At  each 
iteration,  a  particular  region  is  considered  (initially,  the  entire  in¬ 
tegration  region).  A  sampling  of  the  integrand  within  the  region  is 


used  to  estimate  various  properties  of  the  integrand,  which  then  guide 
a  strategy  for  division  into  several  subregions.  This  process  continues 
until  the  partitioning  has  achieved  some  specified  reduction  in  the  es¬ 
timated  error  of  the  approximate  integral . 

The  effectiveness  of  a  partitioning  strategy  depends,  in  large  part, 
upon  the  degree  to  which  the  properties  of  the  integrand  deduced  during 
the  sampl ing, accurately  reflect  the  characteristics  of  the  function.  If 
the  function  is  badly  behaved,  its  predicted  and  actual  behavior  may  not 
even  resemble  one  another,  which  may  lead  to  ineffective  or  counterpro¬ 
ductive  partitions.  This  possibility  is  especially  likely  in  high  dim¬ 
ensions  where  even  samplings  of  large  cardinality  are  very  sparse  (for 
example,  in  ten  dimensions  a  sampling  of  60,000  points  is  equivalent  to 
about  three  points  per  coordinate).  On  the  other  hand,  a  complex  par¬ 
titioning  strategy  that  requires  a  very  large  number  of  integrand  evalu¬ 
ations  may  be  inefficient  because  the  additional  evaluations  might  be 
better  expended  simply  to  increase  the  number  of  points  used  to  compute 
the  final  integral  estimate.  Thus,  a  good  partitioning  strategy  must  be 
based  on  a  computationally  feasible  way  of  assessing  the  integrand's  be¬ 
havior. 

The  main  d-'stinctions  of  the  new  partitioning  strategy  from  previous¬ 
ly  proposed  methods  are: 

(1)  The  behavior  of  the  integrand  within  a  region  is  estimated 
via  mul ti parameter  optimization  rather  than  sampling; 

(2)  All  subregions  are  defined  by  simple  bounds  on  the  co¬ 
ordinates. 
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2.  OVERVIEW  OF  THE  ADAPTIVE  REFINEMENT  PROCEDURE 


Consider  a  hyperrectangular  region  R,  defined  by  simple  bounds  on 
each  coordinate: 

R  =  t x  1  *  x.  s  x1!}  (1) 

where  x  is  the  vector  (x^,  x2,  ...  xn)^* 

The  essence  of  an  adaptive  strategy  for  partitioning  R  can  be  specified 
by  three  attributes: 

(1)  A  measure  s(R)  that  indicates  the  "badness"  of  the  inte¬ 
grand's  behavior  within  R; 

(2)  A  method  for  subdividing  the  region  after  s(R)  has  been  de¬ 
termined; 

(3)  A  procedure  for  processing  the  new  subregions  and  for 
terminating  the  partitioning. 

The  quantity  used  in  the  new  algorithm  to  characterize  the  integrand 
is  the  difference  of  extreme  values  within  R,  weighted  by  the  volume  of 
R.  Let 

v(R)  =  max  f(x)  -  min  f(x),  (2) 

x€R  x€R 

where  f(x)  is  a  scalar-valued  function  (presently,  the  integrand  func¬ 
tion).  The  spread  s(R)  is  then  defined  by: 

s(R)  =  v ( R)  •  vol(R).  (3) 

The  spread  measure  s(R)  bounds  the  uncertainty  of  a  quadrature  or  Monte 
Carlo  estimate  of  the  integral  over  R,  and  is  taken  to  indicate  the  con¬ 
tribution  of  R  to  the  uncertainty  of  a  global  estimate  of  the  integral. 

The  choice  of  the  measure  (3)  depends  in  two  crucial  ways  on  the 
simply-bounded  form  (1)  of  R.  First,  the  volume  of  such  a  region  is 
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easy  to  compute: 


vol (R)  =  n  (x!J  -  x\). 
i=l  1  1 

This  would  not  be  true  if  more  complicated  regions  were  allowed.  The 
other  term  (2)  may  seem,  at  first  glance,  to  be  computationally  intract¬ 
able  since  two  optimization  problems  must  be  solved  to  calculate  it. 
However,  methods  for  optimization  with  simple  bounds  on  the  variables 
are  well  developed  and,  thus,  the  sub-problems  associated  with  (2)  can 
be  solved  quite  efficiently  if  f  is  a  reasonable  function.  Section  3 
gives  some  details  of  the  optimization  procedure. 

Given  that  (3)  is  the  spread  measure,  the  second  element  of  the 
partitioning  algorithm  involves  dividing  the  single  region  (1)  into  dis¬ 
joint  simply-bounded  subregions.  The  strategy  for  subdivision  is  based 
on  the  assumption  that  the  same  quadrature  rule  will  be  applied  in  each 
subregion  at  the  conclusion  of  the  partitioning  so  that  the  aimed-for 
final  result  is  a  list  of  regions  with  "similar"  spread  measures.  Sec¬ 
tion  4  presents  the  method  by  which  a  given  region  is  refined  to  achieve 
this  goal. 

Finally,  after  R  has  been  partitioned,  the  daughter  subregions  are 
merged  into  the  list  of  all  regions.  If  the  global  stopping  criteria 
are  satisfied,  the  partitioning  terminates.  Otherwise,  the  list  of 
regions  is  scanned  for  the  one  with  the  largest  spread  measure,  which  is 
then  considered  for  refinement  at  the  next  iteration.  Details  of  this 
aspect  of  the  algorithm  are  given  in  Section  5. 

Figure  1  illustrates  the  partitioning  achieved  by  applying  this  re¬ 
cursive  partitioning  procedure  to  the  function 
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f(x1,x2)  =  exp  {-15[x2  +  (x2  -  0.5)2]} 

+  exp  {-1 5[(x]  +  0.433)2  +  (x2  +  0.25)2]} 

+  exp  {-1 5[ ( x-j  -  0.433)2  +  (x2  +  0.25)2]} 

with  -1  <;  x-j  s  1  and  -1  s  x2s  1. 

Figure  la  shows  an  isometric  representation  of  the  surface  defined  by 
y  =  f(x.j,x2).  Figure  lb  displays  some  isopleths  of  the  function  on  the 
plane,  and  Figure  lc  shows  the  partitioning  of  the  plane  achieved  by 
applying  the  above  procedure  recursively,  in  this  case  creating  eleven 
subregions.  The  numbers  indicate  the  order  in  which  the  corresponding 
cuts  were  made. 

3.  OPTIMIZATION  WITH  SIMPLY-BOUNDED  VARIABLES 

To  compute  the  spread  measure  (3)  at  each  step  of  the  partitioning 
procedure,  it  is  necessary  to  solve  two  bounds-constrained  optimization 
problems  of  the  form: 

Ml :  min  f(x) 

subject  to  x'-  s  x  s  x^ 

M2:  max  f(x) 

subject  to  x1-  s  x  s  xl 

where  the  scalar-valued  function  f  drives  the  partitioning;  and  the  vec¬ 
tors  xL  and  xU  contain,  respectively,  the  lower  and  upper  bounds  that  de 
fine  the  desired  region. 

The  problem  M2  can  be  treated  as  a  minimization  problem  involving 
(-f(x)),  and  therefore  all  subsequent  discussion  will  concern  minimiza- 
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tion  only. 

In  a  typical  quadrature  problem,  f(x)  will  be  twice  continuously 
differentiable,  or  at  least  will  be  non-smooth  only  at  isolated  points. 

The  algorithm  selected  to  solve  problem  Ml  should  consequently  be  able 
to  perform  well  on  a  smooth  function.  However,  in  most  instances,  the 
derivatives  of  f  will  not  be  available  so  that  the  method  of  choice  should 
require  function  values  only.  Based  on  these  considerations,  the  optimi¬ 
zation  method  used  in  the  partitioning  algorithm  is  a  bounds-constrained 
quasi-Newton  method  with  finite-difference  approximations  to  first  deriv¬ 
atives. 

Quasi-Netwon  methods  for  unconstrained  optimizations  have  a  remark¬ 
able  history,  beginning  with  Davidon  [1959],  and  Fletcher  and  Powell  [1963]; 
a  recent  summary  of  their  motivation  and  properties  is  given  by  Dennis  and 
More  [1977].  Quasi-Newton  methods  have  been  extremely  successful  on  a 
wide  variety  of  problems;  if  properly  implemented,  they  are  quite  robust 
and  usually  display  superl inear  convergence.  The  idea  of  a  quasi-Newton 
method  is  to  build  up  second-order  information  about  the  function  to  be 
minimized,  by  incorporating  the  observed  changes  in  the  gradient  into  a 
matrix  that  approximates  the  underlying  matrix  of  second  partial  deriva¬ 
tives  (Hessian  matrix),  so  that  the  method  should  eventually  behave  like 
Newton's  method. 

A  typical  iteration  of  an  unconstrained  quasi-Newton  method  begins 
with  the  current  iterate,  x;  the  gradient  vector  of  f,  g;  and  an  approxi¬ 
mation  to  the  Hessian,  the  matrix  B. 

(i)  If  the  norm  of  the  gradient  is  sufficiently  small,  the  pro¬ 
cedure  terminates.  Otherwise,  proceed  to  step  (ii). 


9 


(ii)  Solve  the  linear  system 
Bp  =  -g 

for  the  search  direction,  p.  In  practice,  numerical  sta¬ 
bility  is  insured  by  using  a  Cholesky  factorization  of  the 
matrix  B,  so  that  p  is  always  a  direction  of  descent  for  f. 

This  essential  feature  is  due  to  Gill  and  Murray  [1972]. 

(iii)  Find  a  steplength  a  >  0  that  yields  a  sufficient  decrease  in 
f,  so  that 

f(x  +  ap)  <  f(x). 

The  steplength  algorithm  used  in  the  current  procedure  is 
the  safeguarded  quadratic  interpolation  procedure  imple¬ 
mented  by  Gill  and  Murray  [1974a]. 

( i v )  Evaluate  the  gradient  at  x  +  ap,  and  produce  an  updated 
Hessian  approximation  by  modifying  the  Cholesky  factori¬ 
zation  of  B  with  the  BFGS  quasi-Newton  update  [see  Gill 
and  Murray,  1974b].  Return  to  step  (i)  with  x  +  ap  as 
the  next  iterate. 

In  the  present  algorithn,  it  is  assumed  that  the  analytic  gradient  of 
f  is  not  available,  so  that  the  calculation  of  the  vector  g  is  carried 
out  using  finite  differences. 

When  the  variables  are  constrained  to  be  between  simple  bounds,  the 
above  algorithm  can  be  modified  in  a  straightforward  manner.  At  each 
iteration,  it  is  determined  whether  a  given  variable  is  "free"  to  vary  or 
is  to  be  held  "fixed"  at  one  of  its  bounds.  After  this  decision,  the  un¬ 
constrained  algorithm  is  applied  with  the  following  changes: 
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(1)  The  gradient,  direction  of  search,  and  approximate  Hessian 
represent  the  free  variables  only; 

(2)  The  steplength  in  step  (iii)  may  need  to  be  restricted  to 
prevent  a  free  variable  from  violating  a  bound  during  an 
iteration.  In  this  case,  the  variable  subsequently  becomes 
fixed  on  that  bound. 

(3)  The  updates  to  B  involve  only  the  free  variables. 

(4)  The  test  for  convergence  in  (i)  is  based  on  the  norm  of  the 
gradient  with  respect  to  the  free  variables.  When  this 
quantity  is  sufficiently  small,  it  is  necessary  to  check 
whether  freeing  any  variable  currently  held  fixed  on  its 
bound  will  lead  to  a  reduction  in  f.  This  determination 

is  made  by  checking  the  sign  of  the  gradient  with  respect 
to  all  fixed  variables  -  e.g.,  if  the  ith  variable  is  fixed 
on  a  lower  bound  and  the  ith  component  of  the  gradient  is 
negative,  the  ith  variable  can  be  released  from  its  bound. 

The  many  additional  details  of  the  algorithm  are  given  in  full  in 
the  software  documentation  [Friedman  and  Wright,  1979],  including  user- 
controlled  tolerances  that  define,  for  example,  "sufficiently  small"  in 
step  (i). 

Before  beginning  to  solve  problems  Ml  and  M2,  the  function  f  is  eval 
ated  at  a  random  sample  of  points  (say,  50)  in  R,  and  the  point  with  the 
smallest  (largest)  function  value  is  used  as  the  initial  point  for  the 
minimization  (maximization).  Although  for  some  regions  the  extrema  com¬ 
puted  at  previous  stages  could  be  used  as  part  of  the  initial  sample, 
this  information  is  not  retained  in  order  to  improve  robustness.  Espe- 


cially  in  high  dimensions,  convergence  to  a  spurious  saddle  point  might 
preclude  any  further  search  for  the  true  extremum  since  the  convergence 
criteria  would  be  satisfied  at  the  initial  point. 

An  additional  feature  of  the  algorithm  that  is  designed  to  improve 
robustness  is  a  "local  search",  to  be  used  if  the  gradient  is  small  at 
the  initial  point.  The  idea  is  again  to  avoid  a  spurious  indication  of 
convergence  at  a  saddle  point.  The  details  of  the  local  search  are  rather 
complicated  and  only  the  general  idea  will  be  sketched.  First,  a  point 
perturbed  from  the  initial  point  is  generated  by  moving  a  small,  feasible 
amount  along  each  coordinate  until  the  function  value  changes  sufficiently. 
Next,  a  feasible  descent  direction  is  constructed  at  whichever  point  is 
lower  and  an  exact  line  search  is  carried  out  along  that  direction.  Then, 
a  second  feasible  descent  direction,  orthogonal  to  the  first,  is  generated 
at  the  lowest  point  found  and  a  second  exact  line  search  is  performed. 

If  this  process  fails  to  yield  a  "sufficiently  lower"  function  value, 
the  initial  point  is  accepted;  otherwise,  the  quasi-Newton  procedure  be¬ 
gins  at  the  new  point. 

This  local  search  cannot  be  guaranteed  to  move  away  from  a  saddle 
point  since  the  function  is  evaluated  only  at  a  finite  number  of  points 
along  two  directions.  In  practice,  the  local  search  has  been  quite  suc¬ 
cessful  on  all  the  examples  tested. 
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4.  REFINEMENT  INTO  SUB-REGIONS 

In  this  section,  we  shall  be  concerned  with  a  particular  region,  R, 
defined  by  (1).  Let  xmax  and  xmin  denote  the  points  in  R  where  f  achieves 
its  maximum  and  minimum,  respectively,  with  fmax  and  fmin  the  correspond¬ 
ing  function  values.  For  simplicity,  we  assume  that  there  are  no  other 
local  extrema  in  R;  the  implemented  procedure  contains  provisions  to 
handle  situations  where  this  assumption  is  not  satisfied. 

A  partitioning  strategy  that  allowed  completely  general  regions 
might  divide  R  into  two  disjoint  parts  with  equal  spread  measures,  as 
follows.  Suppose  that  for  a  given  value  f,  f™111  s  f  s  fmax,  one  could 
determine  an  isoplethic  surface  (xjf(x)  =  f}  that  separates  R  into  two 
parts,  Rmax  (which  contains  xmax)and  Rmin  (which  contains  xmin)-  Under 
the  uniqueness  assumption,  the  differences  of  extreme  function  values 
within  Rmax  and  Rmin,  respectively,  would  be  (f™ax  >  ?)  and  (f  -  fmi*n). 

The  desired  choice  for  f  would  make  the  spread  measures  associated  with 
Rmax  and  Rmin  equal,  i.e., 

(f"1^  -  f)  vol  ( Rmax )  =  (f  -  fmin)  vol  (Rmin) .  (4) 

The  strategy  just  described  is,  of  course,  impractical  since  the  de¬ 
termination  of  the  isopleths  would,  in  general,  be  an  extremely  complex 
numerical  procedure.  Since  the  resulting  subregions  Rmax  and  Rmin  would 
no  longer  be  defined  in  general  by  simple  bounds,  the  next  optimization 
problem  would  also  be  much  more  complicated  to  solve.  Furthermore,  it 
would  be  much  more  difficult  to  compute  the  volume  of  such  a  general 
region. 
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The  strategy  adopted  in  the  present  algorithm  divides  the  region  R 
into  a  collection  of  simply-bounded  subregions  by  constructing  an  axis- 
oriented  hyperrectangular  approximation  to  either  Rmax  or  Rmin.  Either 
fmax  or  f"”11  is  selected  as  the  "major"  extremum  (fM),  i.e.,  that  for 
which  the  corresponding  function  value  is  farthest  from  the  mean  function 
value  f  in  R  (f  is  the  average  of  function  values  at  the  initial  random 
sample  of  points). 

If  f"‘X  *  f*li"  *  f .  then  fM  =  f"“,  f”  =  f"1";  otherwise. 
fM  =  f"1™,  f"1  =  f"1^  (with  the  corresponding  choices  for  xM,  xm). 

M  M 

We  then  seek  to  define  a  region  R  containing  x  by  two  sets  of 
"cuts"  (fi+,  6“)  along  the  positive  and  negative  coordinate  directions 
from  x*S 

RM  «  { x |  xl  -  6:  s  Xl  s  x”  +  6t), 

5j  i  0,  i  s  1,  n 
with  6+,  6”  chosen  such  that 
f(xM+  fit  e.)  =  f 

1  1  (5) 

f(xM  -  fij  e.)  =  f,  i  =  1,  n 

where  e^  is  the  ith  column  of  the  identity  matrix. 

The  equation  to  be  satisfied  by  f  is  a  re-arrangement  of  (4) 


vfM  +0^)^  =  f, 

where  Y  =  vol(R^)/vol(R). 

M 

Because  R  is  defined  by  simple  bounds, 


(6) 
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vo1(rm)  =  n  (at  +  6T). 

i=i  1  1 


Thus,  the  vectors  6  ,  6“  are  the  solution  of  the  2n  nonlinear  equations: 


n 

n 


(6t  +  6") 

f(x>l  *  6i  *i>  -  vd(R?  - 


n 

n 

i=l 


(6T  +  6") 


vol  (R) 


_]f^ 


(7) 


n  n 

n  (6.  +  6T)  n  (6.  +  6j 

f(xM  -  6:  e.)  =  ~VoT(r) -  f”  +  [1  -  —  ~oT( Rj — -Jf"1. 

i=l ,  ...»  n. 

Several  considerations  affect  the  choice  of  solution  method  for  the 
nonlinear  system  (7).  It  is  undesirable  to  expend  too  much  effort  in 
solving  (7)  since  the  solution  need  not  be  computed  with  very  high  accu¬ 
racy.  This  means  that  a  Newton-type  method  based  on  standard  finite 
differences  is  unacceptable  because  of  the  2n  function  evaluations  re¬ 
quired  at  every  iteration  to  compute  the  Jacobian.  A  reasonable  altern¬ 
ative  is  to  use  a  secant-type  method,  where  the  elements  of  the  Jacobian 
are  approximated  by  differencing  the  function  values  from  the  previous 
iteration.  The  switch  to  a  secant  method  is  worthwhile  because  such 
methods  display  local  superl inear  convergence,  and  are  typically  as 
effective  as  a  Newton-type  method  in  moving  from  a  poor  initial  estimate 
of  the  solution  to  a  reasonably  good  one  (which  is  all  that  is  required 
in  this  case)  [Mor£,  1977]. 

Eyen  a  secant  method  for  solving  (7)  could  be  considered  objection¬ 
able  because  it  requires  the  solution  of  a  2n  by  2n  linear  system  at  each 
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iteration.  However,  the  special  form  of  (7)  allows  it  to  be  transformed 
to  an  equivalent,  but  simpler,  nonlinear  system. 

The  right-hand  side  of  (7)  is  a  vector  whose  components  are  all 
equal,  and  hence  the  vectors  6+,  6"  also  satisfy  the  2n  nonlinear  equa¬ 
tions 

f(xM  +  6|  e7)  -  f(xM  +  6^  e2)  =  0 
f(xM  +  6^  e2)  -  f(xM  +  e3)  =  0 

(8) 


f(xM+^e1)-f  =0 

The  attractive  feature  of  the  system  (8)  is  that  since  each  equation  (ex¬ 
cept  for  the  last)  involves  only  two  adjacent  unknowns,  the  Jacobian  dis¬ 
plays  the  following  special  structure: 

x  x  0  .  .  .  .0 

0  x  x  0  .  .  .0 

0  0  x  x  0  .  .0 

(9) 

0  0  0.  .  .  .x 
xxx.  .  .xx 

If  no  interchanges  are  necessary,  the  matrix  (9)  will  be  reduced  to 
upper  triangular  form  very  easily,  by  simply  subtracting  multiples  of 


each  successive  row  from  the  last.  This  means  that  solving  the  linear 
system  at  each  iteration  is  extremely  fast. 

Numerous  safeguards  are  included  in  the  secant  procedure  -  in  par¬ 
ticular,  each  variable  6*,  6T  is  constrained  to  remain  within  the  range 
where  the  solution  must  lie,  and  the  norm  of  the  vector  that  is  the  left- 
hand  side  of  (8)  is  required  to  decrease  at  every  iteration. 

For  simplicity  of  exposition,  certain  complications  were  not  in¬ 
cluded  in  the  preceding  discussion  of  the  partitioning  strategy  -  in 
particular,  the  fact  that  not  all  possible  directions  are  considered  as 
candidates  for  cuts.  Since  the  bookkeeping  overhead  for  any  cut  is  the 

same,  it  is  prudent  to  disregard  cuts  that  appear  to  be  ineffective. 

M 

Certain  directions  are  eliminated  for  two  reasons.  First,  x  may  be  very 

close  to  an  upper  or  lower  bound,  so  that  the  cut  would  be  insignificant. 

Therefore,  no  cut  is  made  along  the  ith  positive  direction  if 

U  M  ,  U  L, 

X1  '  xi  5  6<xi  '  xi  >• 

nor  along  the  ith  negative  direction  if 

x,M-*,LsS(xi“-xjL), 

where  0  <  p  <  1  (currently,  p  =  .05). 

Furthermore,  the  solution  values  for  6*  are  constrained  to  satisfy 

0  <  s  <*(x.U  -  x.M) 

1  1  1  (10) 

0  <  6^  s  afx^  -  xiL), 

where  0  <  a  <  1  (currently,  a  -  1/2).  Before  beginning  the  iteration 
procedure  to  solve  for  the  cuts,  f  is  evaluated  at  the  points  where 
6±,  i=l,  ....  n,  are  at  the  upper  bounds  given  by  (10);  the  values  of  Y 
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and  f  from  (6)  are  then  computed  at  this  initial  configuration.  Assume 
that  fM  =  f1113*  (a  similar  analysis  holds  when  f^  =  f™n),  and  consider 
a  possible  positive  cut  along  the  ith  coordinate.  By  assumption,  f  is 
monotonic  along  the  ith  coordinate  {moving  away  from  the  extremum)  so  that 
the  value  of  f  corresponding  to  the  maximum  6^  will  be  smaller  than  f  at 
any  other  sj  in  the  acceptable  range.  In  addition,  the  initial  f  will  be 
the  maximum  possible  value.  Thus,  if  f  at  the  initial  6T  exceeds  f  » 
there  can  be  no  solution  to  (7)  in  the  desired  range,  and  so  no  attempt 
will  be  made  to  find  (the  ith  upper  bound  remains  unaltered). 

The  above  procedure  is  carried  out  for  all  possible  directions.  It 
should  be  noted  that  if  any  direction  is  eliminated,  the  values  of  Y  and 
f  in  (6)  must  be  recomputed. 

5.  MULTIPLE  INTEGRATION 

The  result  of  applying  the  nested  refinement  procedure  described  in 
the  previous  three  sections  is  a  set  of  hyperrectangular  subregions  of  the 
domain  of  integration  which  are  mutually  exclusive  and  collectively  cover 
the  integration  region.  The  variation  of  integrand  values  within  the 
regions  has  been  designed  to  be  substantially  less  than  between  the 
regions.  Since  the  regions  are  mutually  exclusive,  the  results  can  be 
summed  to  form  the  global  integral  estimate. 

A  variety  of  methods  exist  for  evaluating  the  definite  integral  with¬ 
in  each  subregion.  (For  an  excellent  and  rather  complete  survey,  see 
Stroud,  1971).  These  methods  can  be  crudely  characterized  by  the  regu¬ 
larity  they  require  of  the  integrand  (and/or  its  derivatives  to  various 
orders)  and  their  accuracy  per  integrand  evaluation.  At  one  extreme  are 
the  Monte  Carlo  methods  [Halton,  1970]  which  usually  require  very  little 


of  the  integrand  (and  thus  are  quite  robust)  but  which  converge  rather 
slowly.  Monte  Carlo  methods  also  yield  a  simple  uncertainty  estimate. 

At  the  other  extreme  are  the  high  degree  quadrature  formulae  [Stroud, 
1971]  which  can  be  applied  only  to  very  regular  integrands,  but  which 
can  yield  high  accuracy  for  very  few  integrand  evaluations. 

Limited  experience  has  indicated  that  the  more  robust  methods  per¬ 
form  best  in  conjunction  with  this  partitioning  method.  Of  these,  the 
greatest  success  has,  so  far,  been  obtained  with  the  quasi  uniform  Monte 
Carlo  methods  of  Korobov  [1963].  Reasonable  success  has  also  been 
achieved  with  simple  pseudo  random  Monte  Carlo  methods  [Halton,  1970]. 

6.  EXAMPLES 

In  this  section,  we  attempt  to  illustrate  some  of  the  properties  of 
this  nested  refinement  procedure  for  multiple  integration  by  applying  it 
to  several  examples  presented  by  others  to  illustrate  their  integration 
procedures. 

The  integration  method  used  in  each  subregion  for  the  examples  be¬ 
low  is  (with  one  exception)  the  quasi  uniform  Monte  Carlo  method  of 
Korobov  [1963],  as  described  in  Stroud  [1971].  The  rate  of  convergence 
of  this  method  with  increasing  N  depends  upon  the  smoothness  of  the  in¬ 
tegrand,  but  it  is  never  slower  than  1/N.  This  method  (like  most  quad¬ 
rature  methods)  does  not  provide  a  simple  estimate  of  the  uncertainty 
associated  with  the  integral  evaluation  in  each  subregion.  It  has  been 
found  empirically  that  the  quantity 

°i  *  I VN 
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provides  a  reliable  (and  usually  quite  conservative)  estimate  of  the  un¬ 
certainty  associated  with  this  method.  Here  Si  is  the  spread  of  the  ith 
region,  N  is  the  number  of  sample  points,  and  the  factor  1/2  is  intro¬ 
duced  because  of  the  convention  of  reporting  uncertainty  as  a  symmetric 
(plus  or  minus)  half  value  about  the  estimate.  Since  the  integral  es¬ 
timates  in  each  subregion  are  independent,  the  total  uncertainty  a  is 
taken  as  the  square  root  of  the  sum  of  squares  of  the  individual  region 
uncertainties 


a 


1 

2N 


(ID 


Here  M  is  the  number  of  subregions. 

This  uncertainty  estimate  (11)  can  be  used  as  a  basis  for  termin¬ 
ating  the  partitioning.  At  a  given  stage  of  partitioning,  let  there  be 
M  subregions  and  Np(M)  integrand  evaluations.  If  one  wishes  to  estimate 
the  integral  with  (prespecified)  uncertainty  aQ,  then  from  (11) 


Nj(M)  = 


(12) 


integrand  evaluations  will  be  required  to  estimate  the  integral  in  each 
subregion.  Therefore,  the  total  number  of  integrand  evaluations  needed 
to  achieve  accuracy  Uq  if  the  partitioning  is  stopped  after  M  regions  is 
Nt(M)  =  Np(M)  +  Nj(H)-M  .  (13) 

As  the  partitioning  proceeds  (increasing  M)  Np(M)  increases  (approxi¬ 
mately  linearly)  while  Nj(M)  decreases  due  to  the  reduction  in  spread. 

This  reduction  tends  to  be  very  rapid  initially,  tapering  off  to  slow 
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reduction  for  large  M.  Therefore,  Ny(M)  tends  to  decrease  for  small 
(increasing)  M,  reaching  a  mimimum,  and  then  starts  to  slowly  increase. 

An  optimal  strategy  is  to  terminate  partitioning  at  the  point  at  which 
Ny(M)  achieves  its  minimum  value.  Since  it  is  not  possible  to  know  (in 
advance)  this  optimum  value,  we  terminate  the  partitioning  at  the  first 
point  for  which  Ny(M)  (13)  fails  to  decrease  for  several  (five)  suc¬ 
cessive  iterations.  Equation  (12)  is  then  used  to  determine  the  number 
of  evaluation  points  Nj(M)  to  perform  the  final  integration  in  each  sub- 
region. 

Table  1  shows  results  of  applying  this  procedure  to  a  series  of  in¬ 
tegrals  presented  by  LePage  [1978].  The  table  presents  the  answer  ob¬ 
tained  with  estimated  uncertainty,  the  total  number  of  evaluations  of 
the  integrand  (partitioning  plus  integral  evalation)  Ny,  the  number  used 
for  the  partitioning  stage  alone,  Np,  and  the  number  of  subregions  re¬ 
sulting  from  the  partitioning.  For  comparison,  the  results  presented  by 
LePage  for  both  his  method  and  a  Guass-Legendre  product  rule  are  also 
presented. 

LePage  employs  a  factored  approximation  of  the  form 

f(x)  =  n  f.(x.)  (14) 

i=l  1  1 

which  is  used  as  a  basis  for  pseudo  random  Monte  Carlo  importance  samp¬ 
ling  within  the  region  of  integration.  This  procedure  should  be  espe¬ 
cially  suitable  for  the  integrals  presented  in  Table  la  since  the  fac¬ 
tored  approximation  of  (14)  is  exact.  As  seen  in  Table  la,  it  considerably 
outperforms  the  one  presented  here  in  high  dimensionality.  However,  the 
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’p  "  i  'i'’X  “P  [',0°  ft,  (Xi  ‘  V2)2] 


=  1.0 

"  i  i 

This  Method 

LePaqe  [19781 

Gauss-Legendre 

0.999  ±  0.007 

0.994  ±  0.007 

.892 

Ny  *  7403 

Np  =  3851 

24  regions 

nt  =  10000 

nt  =  10000 

1.01  ±  .008 

1.001  ±  .005 

71.364 

Nt  =  277238 

Np  =  83626 

nt  =  100000 

Nt  =  2.0  x  106 

388  regions 

0.774 

nt  =  109 

TABLE  la 
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best  procedure  in  this  case  would  be  to  integrate  each  one-dimensional 
function  separately  and  then  form  the  combined  integral  as  the  product 
of  the  one-dimensional  integrals.  The  integrals  presented  in  Table  lb 
do  not  conform  exactly  to  the  factored  approximation  of  (14),  and  the 
comparison  for  these  cases  is  more  favorable  to  the  partitioning  method 
presented  here. 

Table  2  fhows  results  of  applying  this  procedure  to  four  integrals 
presented  by  Sasaki  [1979].  He  employs  a  factored  approximation  of  the 
form 

P-1 

f(x)  =  n  f.  (x.,x.  ,).  (15) 

i=l  1  1  1+1 

Each  function  f^(x^,x^+^  is  represented  by  an  adaptive  piecewise  con¬ 
stant  approximation  on  the  plane.  For  the  first  two  integrands  of  Table 
2,  the  factored  approximation  (15)  is  exact,  while  for  the  last  two,  it 
is  not.  The  method  presented  here  is  seen  to  perform  well  in  comparison 
to  that  of  Sasaki  for  these  integrands.  However,  as  Table  2  indicates, 
these  integrands  are  well  approximated  by  low  order  polynomials  and  a 
simple  Gauss-Legendre  product  rule  outperforms  both  methods  in  this  case. 

Table  3  shows  results  on  several  of  the  integrals  presented  by  Hal  ton 
and  Zeidman  [1971].  They  describe  a  nested  refinement  procedure  based  on 
successive  bisection.  The  procedure  described  in  this  report  was  moti¬ 
vated,  to  a  substantial  degree,  by  this  MCSS  (Monte  Carlo  Sequential 
Stratification)  technique.  Inspection  of  Table  3  indicates  that  the  op¬ 
timization  method  described  in  this  report  compares  favorably  with  the 
MCSS  procedure. 
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«•  v  wr  not**:-** 


Nt  =  2278  Nt  =  300000  Ny  =  2304 

Np  =  1339 
8  regions 


I4  0.998  ±  0.007  1.003  ±  0.006  .927 

Nt  =  10230  Nt  =  300000  Ny  =  10000 

N  =  4662 

r 

29  regions 


0.994  ±  0.005 

0.991  ±  0.007 

2.27 

Nt  =  190894 

Nt  =  2.4  x  106 

Nt  =  279936 

Np  =  43449 
99  regions 


1.03  ±  0.025 

0.96  ±  0.04 

240.08 

Nt  =  303228 

NT  =  1.5  x  106 

Nt  =  262144 

Np  =  151033 

0. 0065 

305  regions 

TABLE  lb 

Nt  =  1.95  x  106 
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This  Method 

S  a  s  a  * 
Method  1 

:  i  (1979) 

Method  2 

0.7193  ±  0.0005 

0.7182  ±  0.0005 

0.7194  ±  0.0003 

Nt  =  40526 

Np  =  7010 

48  regions 

Nt  =  7000 

Nt  =  84529 

Tb 

0.1740  ±  0.0004 

0.1718  ±  0.0004 

0.1721  ±  0.0004 

Nt  =  46656 

Np  =  6183 

Nt  =  70000 

Nt  =  84529 

66  regions 

Jc 

0. 5072  ±  0. 0004 

0.5005  ±  0.0004 

0.5017  ±  0.0004 

Ny  =  10676 

30  regions 

Nt  =  70000 

Nt  =  84529 

*d 

0.8193  ±  0.0008 

0.816  ±  0.00] 

0.8214  i  0.0008 

Nt  =  46663 

Nt  =  70000 

Nt  =  84529 

Gauss-Legendre 

.71902  Nt  =  15625 
.71902  Nt  =  46656 


.1723  Nt  =  46645 


Np  =  10798 
23  regions 


.502  Nt  =  15625 


.8208  Nt  =  46645 


TABLE  2 
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Integral  This  Method 

Ia  1.03±0.02(1) 

Nt  =  91096 

Np  =  50138 

273  regions 

Ib  0.018517  ±  0.68x10 

Nt  =  16237 
Np  =  7837 

27  regions 

I5  .999  ±  0.003 

Nt  =  15378 
Np  =  7816 

38  regions 

I10  1.017  ±0.007 

Nt  =  360609 
Np  =  73225 
184  regions 


Hal  ton  &  Zeidman  (1971) 
0.944  ±  0.029 
NT  =  205677 

0.018516  ±  0.11x1 0"4 
N?  =  120145 


0.96  ±  0.02 
Nt  =  105821 


0.90  ±  0.11 
Nt  =  453872 

TABLE  3 
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Gauss-Legendre 

.921 

Nt  =  59049 

. 01 969 
Ny  =  32768 


1.155 

NT  =  16807 


0.0076 

Nt  =  1.05xl06 
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7.  DISCUSSION 


The  results  of  the  previous  section  indicate  that  the  partitioning 
strategy  presented  here  can  be  useful  for  multiple  integration.  Although 
one  might  have  rejected  out  of  hand  (as  being  hopelessly  too  expensive) 
the  notion  of  applying  function  optimization  to  these  problems,  the  ex¬ 
amples  illustrate  that,  at  least  for  difficult  problems  in  high  dimension¬ 
ality,  this  is  not  the  case. 

The  purpose  of  applying  the  partitioning  is  to  divide  integration 
region  into  subregions,  such  that  the  behavior  of  the  integrand  within 
each  is  relatively  good  when  compared  to  its  behavior  over  the  entire 
integration  region.  In  this  context,  "bad  behavior"  is  ideally  defined 
as  error  associated  with  a  particular  integration  method.  Our  choice  of 
spread  (3)  as  such  a  measure  is  motivated  by  the  relative  ease  and 
reliability  with  which  it  can  be  estimated  (using  function  optimization), 
as  compared  to  other  properties  of  the  integrand  (such  as  the  variance) 
which  require  adequate  sampling  to  be  reliably  estimated.  The  partition¬ 
ing  procedure  will  be  most  effective  in  those  cases  for  which  the  spread 
measure  closely  corresponds  to  with  the  uncertainty  in  the  integral  es¬ 
timate. 

t 

The  main  objective  of  the  isoplethic  division  strategy  is  to  make 
finer  divisions  (locally)  in  those  directions  in  which  the  integrand  is 
most  rapidly  varying.  The  strategy  will  tend  to  accomplish  this  even 
if  the  resulting  hyperrec tangle  does  not  closely  correspond  to  a  function 
isopleth.  However,  the  resulting  reduction  in  spread  will  be  greater 
the  closer  the  approximation  is  to  a  function  isopleth.  Thus,  the  par¬ 
titioning  strategy  will  be  most  effective  when  the  function  isopleths 


tend  to  be  convex  over  the  bulk  of  the  integration  region  and  somewhat 
less  effective  to  the  extent  that  this  is  not  the  case.  (It  should  be 
noted  that  this  generally  j_s  the  case  for  the  examples  of  the  previous 
section.)  As  with  most  numerical  integration  methods,  this  method  will 
have  difficulty  with  highly  oscillatory  integrands. 

An  important  consequence  of  adopting  the  spread  as  an  indication  of 
difficulty  is  that  one  is  less  likely  to  be  deceived  into  thinking  that 
an  integral  estimate  is  accurate  when  it  is  not.  Undersampl ing  can 
cause  both  the  integral  and  its  associated  error  estimate  to  be  seriously 
undervalued.  This  tendency  is  more  pronounced  the  more  difficult  the 
problem.  Owing  to  the  fact  that  the  estimation  of  the  spread  does  not 
rely  on  sampling,  it  is  less  vulnerable  to  this  problem. 

The  memory  requirement  associated  with  the  method  is  not  severe. 

For  each  of  the  M  hyperrectangular  regions,  one  must  store  the  spread 
measure,  Si  (3),  and  the  region  boundaries  x1^,  x*:.  The  boundaries 
can  be  arranged  in  a  binary  tree  requiring  3M-2  integers  and  M-l  real 
quantities.  Thus,  in  all,  storage  for  3M-2  integers  and  2M-1  real  num¬ 
bers  is  required.  For  the  examples  of  the  previous  sections,  the  largest 
number  of  regions  was  M=38§.  Storage  for  several  thousand  regions  could 
easily  be  accommodated  on  most  medium  to  large  scale  computers. 

There  are  several  avenues  of  investigation  that  are  not  addressed 
in  this  report.  It  has  been  assumed  that  the  object  function  used  to 
drive  the  partitioning  was  identical  to  the  integrand.  This  is  not  a 
fundamental  requirement  and  different  choices  may  prove  to  be  useful. 

For  example,  if  several  integrals  are  to  be  evaluated  with  similar  inte¬ 
grands  over  the  same  region  of  integration,  it  might  be  that  a  partition- 
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ing  based  on  one  of  them  will  be  effective  for  integrating  all  of  them. 
Many  integration  formulae  have  the  property  that  they  are  exact  for 
linear  functions.  Thus,  within  any  region,  the  linear  component  of  the 
integrand  is  exactly  integrated  and  one  would  like  a  partitioning  of 
the  domain  of  integration,  such  that  the  range  or  spread  of  derivatives 
within  each  subregion  is  small.  An  object  function  of  the  form 


f,(x)  =2  | 

i=l  axi 


or  f'(x)  = 


&  Mr 


might  be  useful  for  driving  the  partitioning  in  these  cases. 

The  possibility  of  using  this  partitioning  method  in  conjunction 
with  other  adaptive  methods  might  also  be  considered.  Owing  to  its  ro¬ 
bustness,  this  procedure  might  be  applied  as  the  first  stage  of  a  com¬ 
bined  procedure.  In  those  subregions  for  which  the  spread  measure  is 
relatively  large,  one  could  apply  an  adaptive  procedure  based  on  sampling. 
The  methods  of  Genz  [1972],  LePage  [1978],  and  Kahaner  and  Wells  [1979] 
appear  as  good  candidates  for  this  combined  application. 

A  FORTRAN  program  [Friedman  and  Wright,  1979]  implementing  the  nested 
refinement  partitioning  procedure  described  in  this  report,  along  with 
several  numerical  integration  methods,  is  available  from  either  author 
upon  request. 
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